\documentclass[]{article}

% Define page
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}

% Include packages for text modification
\usepackage{amsmath,amsfonts}
\usepackage{bm}

% Include packages for citing
%\usepackage[noadjust]{cite}
\usepackage[numbers]{natbib}%http://merkel.zoneo.net/Latex/natbib.php

% Include packages for graphics
\usepackage{graphicx}
\usepackage{subfigure}

% Define new commands
\newcommand{\etal}{\emph{et al.}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\title{Kappa statistic}

\author{Marius Staring}
\date{}
\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% main text

\section{Introduction}

The kappa statistic is a measure of agreement between the ratings of
different observers. Several definitions of the kappa statistic exist,
and they are not all equivalent. In 1960 Cohen \cite{Cohen60}
introduced a measure of nominal scale agreement for two observers, in
which every disagreement counts equally, regardless of the distance
between categories. In 1968 Cohen \cite{Cohen68} extended this measure
to a weighted kappa statistic $\kappa_w$ for two observers. Fleiss
\cite{Fleiss71} introduced an unweighted kappa statistic $\kappa$ for
two or more observers. The Fleiss kappa statistic for two observers is
not equivalent to the Cohen weighted kappa statistic with an identity
weighting, because the two statistics estimate the agreement expected
by chance differently. No weighted kappa statistic exists for more
than two observers.

In Section \ref{sec:Fleiss} a mathematical definition of the Fleiss
kappa is given, followed by definitions of the Cohen unweighted and
weighted kappa (Sections \ref{sec:Cohen} and \ref{sec:CohenWeighted},
respectively).

The following notation is used:

\begin{table}[h]
\begin{tabular}{cl}
$N$ & the number of cases / observations \\
$n$ & the number of raters / observers \\
$k$ & the number of categories in which a case can be rated \\
\end{tabular}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Fleiss' kappa}\label{sec:Fleiss}

The definition of the Fleiss kappa is based on an $N$ by $k$
observation table or matrix in which the elements $n_{ij}$ represent
the number of observers who assigned the $i$-th case to the $j$-th
category. Then (from \cite{Fleiss71}):
\begin{align}
p_j &= \frac{1}{N n} \sum_{i = 1}^N n_{ij}, \\
P_i &= \frac{1}{n(n-1)} \sum_{j = 1}^k n_{ij} (n_{ij} - 1) =
\frac{1}{n(n-1)} \left( \sum_{j = 1}^k n_{ij}^2 - n \right), \\
P_o &= \frac{1}{N} \sum_{i = 1}^N P_i, \\
P_e &= \sum_{j = 1}^k p_j^2,
\end{align}
where $p_j$ is the proportion of all assignments to the $j$-th
category, $P_i$ is the extent of agreement among the $n$ observers
for the $i$-th subject / case, $P_o$ the observed overall agreement,
and $P_e$ the expected mean proportion of agreement due to chance.
The degree of actually attained agreement in excess of chance $P_o -
P_e$, normalised by the maximum agreement attainable above chance $1
- P_e$, defines the kappa statistic:
\begin{align}
\kappa &= \frac{P_o - P_e}{1 - P_e}.\label{eq:kappa}
\end{align}
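
As an illustration, consider a small hypothetical data set (not taken
from \cite{Fleiss71}) with $N = 3$ cases, $n = 3$ observers and $k = 2$
categories, and observation table
\begin{align*}
(n_{ij}) &= \begin{pmatrix} 3 & 0 \\ 2 & 1 \\ 0 & 3 \end{pmatrix}.
\end{align*}
The definitions above then give
\begin{align*}
p_1 &= \tfrac{1}{9} (3 + 2 + 0) = \tfrac{5}{9}, \quad
p_2 = \tfrac{1}{9} (0 + 1 + 3) = \tfrac{4}{9}, \\
P_1 &= 1, \quad P_2 = \tfrac{1}{6} (4 + 1 - 3) = \tfrac{1}{3}, \quad
P_3 = 1, \\
P_o &= \tfrac{1}{3} \left( 1 + \tfrac{1}{3} + 1 \right) = \tfrac{7}{9}, \\
P_e &= \tfrac{25}{81} + \tfrac{16}{81} = \tfrac{41}{81}, \\
\kappa &= \frac{63/81 - 41/81}{40/81} = \frac{22}{40} = 0.55.
\end{align*}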

The approximate standard deviation of $\kappa$ is:
\begin{align}
\mathit{std}(\kappa) &= \sqrt{ \frac{2}{Nn(n-1)} \frac{\sum_{j = 1}^k
p_j^2 - (2n-3)\left( \sum_{j=1}^k p_j^2 \right)^2 + 2(n-2)
\sum_{j=1}^k p_j^3}{ \left( 1 - \sum_{j=1}^k p_j^2 \right)^2} }.
\end{align}
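
To illustrate the arithmetic only (the formula is a large-sample
approximation, and the numbers below are hypothetical), take $N = 3$,
$n = 3$, $k = 2$, $p_1 = 5/9$ and $p_2 = 4/9$. Then $\sum_j p_j^2 =
41/81$ and $\sum_j p_j^3 = 189/729 = 7/27$, so that
\begin{align*}
\mathit{std}(\kappa) &= \sqrt{ \frac{2}{3 \cdot 3 \cdot 2} \,
\frac{\frac{41}{81} - 3 \left( \frac{41}{81} \right)^2 + 2 \cdot
\frac{7}{27}}{\left( \frac{40}{81} \right)^2} }
= \sqrt{ \frac{1}{9} \cdot \frac{1680 / 6561}{1600 / 6561} }
= \sqrt{ \frac{1.05}{9} } \approx 0.34.
\end{align*}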

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Cohen's kappa}\label{sec:Cohen}

Cohen starts by defining a $k$ by $k$ confusion matrix, in which an
element $f_{ij}$ is the number of cases that the first observer
assigned to category $i$ and the second observer to category $j$. So,
$f_{jj}$ is the number of agreements for category $j$. Then (from
\cite{Altman91}):
\begin{align}
P_o &= \frac{1}{N} \sum_{j = 1}^k f_{jj}, \\
r_i &= \sum_{j = 1}^k f_{ij}, \forall i, \text{ and }
c_j = \sum_{i = 1}^k f_{ij}, \forall j, \\
P_e &= \frac{1}{N^2} \sum_{i = 1}^k r_i c_i,
\end{align}
where $P_o$ is the observed proportional agreement, $r_i$ and $c_j$
are the row and column totals for categories $i$ and $j$, and $P_e$ is
the expected proportion of agreement due to chance. The final measure
of agreement is again given by Equation (\ref{eq:kappa}).
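
As a small worked example, take a hypothetical confusion matrix for $k
= 2$ categories and $N = 50$ cases (illustrative numbers, not taken
from \cite{Altman91}):
\begin{align*}
(f_{ij}) &= \begin{pmatrix} 20 & 5 \\ 10 & 15 \end{pmatrix}, \quad
r_1 = r_2 = 25, \quad c_1 = 30, \quad c_2 = 20, \\
P_o &= \frac{20 + 15}{50} = 0.7, \\
P_e &= \frac{25 \cdot 30 + 25 \cdot 20}{50^2} = \frac{1250}{2500} =
0.5, \\
\kappa &= \frac{0.7 - 0.5}{1 - 0.5} = 0.4.
\end{align*}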

Two approximations of the standard deviation of $\kappa$ are in use:
\begin{align}
\mathit{std}(\kappa) &= \sqrt{ \frac{P_o ( 1 - P_o )}{N ( 1 - P_e
)^2} }, \\
\mathit{std}(\kappa) &= \sqrt{ \frac{P_e + P_e^2 - \frac{1}{N^3}
\sum_{i=1}^k r_i c_i (r_i + c_i)}{N ( 1 - P_e )^2}}.
\end{align}
The first expression was found in \cite{Altman91}. The second was
adapted from \cite{LandisEA77} (using $p_{ij} = f_{ij} / N$) and is
the large-sample form under the hypothesis of chance agreement only
($\kappa = 0$). The two expressions are not identical in general.
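
For a hypothetical confusion matrix with rows $(20, 5)$ and $(10,
15)$, so that $N = 50$, $P_o = 0.7$, $P_e = 0.5$, $r_1 = r_2 = 25$,
$c_1 = 30$ and $c_2 = 20$ (illustrative numbers only), the expression
attributed to \cite{Altman91} and the one adapted from
\cite{LandisEA77} evaluate to
\begin{align*}
\sqrt{ \frac{0.7 \cdot 0.3}{50 \cdot 0.5^2} } &= \sqrt{0.0168}
\approx 0.130, \\
\sqrt{ \frac{0.5 + 0.25 - \frac{1}{50^3} ( 750 \cdot 55 + 500 \cdot 45
)}{50 \cdot 0.5^2} } &= \sqrt{ \frac{0.75 - 0.51}{12.5} } =
\sqrt{0.0192} \approx 0.139,
\end{align*}
which shows that the two expressions give different values for this
table.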

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Cohen's weighted kappa}\label{sec:CohenWeighted}

Cohen's kappa as defined above penalises a disagreement of one
category as heavily as a disagreement of two or more categories. This
is sometimes undesirable, for example when the categories are ordered.
Cohen's weighted kappa $\kappa_w$ is derived from the ordinary kappa
by including a weight function $w$. If $w$ is chosen as the identity
matrix $I_k$, then Cohen's $\kappa_w$ is identical to Cohen's
$\kappa$. A linear weight is commonly chosen, which is calculated as:
\begin{align}
w_{ij} &= 1 - \frac{|i-j|}{k-1}.
\end{align}
Alternatively, a quadratic weight could be used: $w_{ij} = 1 - (i-j)^2
/ (k-1)^2$. Then:
\begin{align}
P_{o(w)} &= \frac{1}{N} \sum_{i = 1}^k \sum_{j = 1}^k w_{ij} f_{ij}, \\
P_{e(w)} &= \frac{1}{N^2} \sum_{i = 1}^k \sum_{j = 1}^k w_{ij} r_i
c_j,
\end{align}
with $r_i$ and $c_j$ again the row and column sums. The weighted
kappa statistic is now given by:
\begin{align}
\kappa_w &= \frac{P_{o(w)} - P_{e(w)}}{1 -
P_{e(w)}}.\label{eq:kappa_w}
\end{align}
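
As a sketch of the computation, consider a hypothetical confusion
matrix with $k = 3$ ordered categories and $N = 40$ cases (illustrative
numbers only), together with the linear weights for $k = 3$:
\begin{align*}
(f_{ij}) &= \begin{pmatrix} 10 & 2 & 0 \\ 3 & 8 & 2 \\ 1 & 2 & 12
\end{pmatrix}, \quad
(w_{ij}) = \begin{pmatrix} 1 & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 &
\tfrac{1}{2} \\ 0 & \tfrac{1}{2} & 1 \end{pmatrix}.
\end{align*}
The row totals are $r = (12, 13, 15)$ and the column totals are $c =
(14, 12, 14)$, so that
\begin{align*}
P_{o(w)} &= \frac{(10 + 8 + 12) + \tfrac{1}{2} (2 + 3 + 2 + 2)}{40} =
\frac{34.5}{40} = 0.8625, \\
P_{e(w)} &= \frac{(168 + 156 + 210) + \tfrac{1}{2} (144 + 182 + 182 +
180)}{40^2} = \frac{878}{1600} = 0.54875, \\
\kappa_w &= \frac{0.8625 - 0.54875}{1 - 0.54875} \approx 0.695.
\end{align*}
For comparison, the unweighted kappa of the same table is $\kappa =
(0.75 - 0.33375) / (1 - 0.33375) \approx 0.625$. It is lower because
the near-miss disagreements receive no partial credit.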

The approximate standard deviation of $\kappa_w$ is (from
\cite{Fleiss81}):
\begin{align}
\mathit{std}(\kappa_w) &= \sqrt{ \frac{ \frac{1}{N^2} \sum_{i=1}^k
\sum_{j=1}^k r_i c_j (w_{ij} - \bar w_{i.} - \bar w_{.j})^2 -
P_{e(w)}^2}{N ( 1 - P_{e(w)} )^2}},\label{eq:stdCohenkw}
\end{align}
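with $\bar w_{i.} = \frac{1}{N} \sum_{j=1}^k c_j w_{ij}$ and $\bar
w_{.j} = \frac{1}{N} \sum_{i=1}^k r_i w_{ij}$ the averages of the
weights in row $i$ and column $j$, weighted by the column and row
totals, respectively (obtained from the expressions in
\cite{Fleiss81} using $p_{ij} = f_{ij} / N$).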
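
As a consistency check, the following sketch evaluates the numerator
of Equation (\ref{eq:stdCohenkw}) for identity weights $w_{ij} =
\delta_{ij}$. The shorthand $p_{i.} = r_i / N$, $p_{.j} = c_j / N$ and
$S$ is introduced here for convenience, and the averages are taken as
$\bar w_{i.} = \sum_j p_{.j} w_{ij}$ and $\bar w_{.j} = \sum_i p_{i.}
w_{ij}$. For identity weights $\bar w_{i.} = p_{.i}$, $\bar w_{.j} =
p_{j.}$ and $P_{e(w)} = P_e$. With $S = \sum_i p_{i.} p_{.i} ( p_{i.} +
p_{.i} ) = \frac{1}{N^3} \sum_i r_i c_i ( r_i + c_i )$, expanding the
square gives
\begin{align*}
\sum_{i=1}^k \sum_{j=1}^k p_{i.} p_{.j} ( \delta_{ij} - p_{.i} - p_{j.}
)^2 &= \sum_i p_{i.} p_{.i} - 2 \sum_i p_{i.} p_{.i} ( p_{.i} + p_{i.}
) + \sum_i \sum_j p_{i.} p_{.j} ( p_{.i} + p_{j.} )^2 \\
&= P_e - 2 S + ( S + 2 P_e^2 ) \\
&= P_e + 2 P_e^2 - S.
\end{align*}
Subtracting $P_{e(w)}^2 = P_e^2$ leaves $P_e + P_e^2 - S$, which is the
numerator of the expression for $\mathit{std}(\kappa)$ adapted from
\cite{LandisEA77} in Section \ref{sec:Cohen}.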

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Comparing kappas}

The following is taken from \cite{Fleiss81}. In order to compare two
kappas, i.e. to test the hypothesis that the underlying value of kappa
or weighted kappa is equal to a prespecified $\hat \kappa$ or $\hat
\kappa_w$ other than zero, the appropriate formulas for the standard
error of $\kappa$ and $\kappa_w$ are given below.

\textbf{For Cohen's kappa} this is given by:
\begin{align}
\mathit{std}(\kappa) &= \frac{1}{\sqrt{N} ( 1 - P_e )} \Biggl\{
\frac{1}{N} \sum_{i=1}^k f_{ii} \left[ 1 - \frac{1}{N} ( r_i + c_i )
( 1 - \kappa ) \right]^2 \notag \\
&\quad + \frac{( 1 - \kappa )^2}{N^3} \sum_{i = 1}^k
\sum_{\substack{j = 1 \\ j \neq i}}^k f_{ij} ( c_i + r_j )^2 - \left[
\kappa - P_e ( 1 - \kappa ) \right]^2 \Biggr\}^{1/2}.
\end{align}

\textbf{For Cohen's weighted kappa} this is given by:
\begin{align}
\mathit{std}(\kappa_w) &= \frac{1}{\sqrt{N} ( 1 - P_{e(w)} )} \sqrt{
\frac{1}{N} \sum_{i=1}^k \sum_{j=1}^k f_{ij} \left[ w_{ij} - (\bar
w_{i.} + \bar w_{.j} ) ( 1 - \kappa_w ) \right]^2 - \left[ \kappa_w -
P_{e(w)} ( 1 - \kappa_w ) \right]^2}.
\end{align}
Compare with Equation (\ref{eq:stdCohenkw}). The hypothesis may be
tested by referring the value of the critical ratio:
\begin{align}
z &= \frac{|\kappa_w - \hat \kappa_w|}{\mathit{std}(\kappa_w)}
\end{align}
to tables of the standard normal distribution and rejecting the
hypothesis if the critical ratio is too large.
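
As a hypothetical example (the numbers are assumed for illustration
only), suppose that $\kappa_w = 0.70$ was measured with
$\mathit{std}(\kappa_w) = 0.08$, and that the prespecified value is
$\hat \kappa_w = 0.50$. Then
\begin{align*}
z &= \frac{|0.70 - 0.50|}{0.08} = 2.5,
\end{align*}
which exceeds the two-sided $5\%$ critical value of $1.96$ of the
standard normal distribution, so the hypothesis would be rejected at
that level.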
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bibliographystyle{unsrt}
\bibliography{kappa}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}
